
Cache snapshotting performance improvements #7830

Merged · 8 commits · Jan 12, 2017

Conversation

@jwilder (Contributor) commented Jan 12, 2017

Required for all non-trivial PRs
  • Rebased/mergeable
  • Tests pass
  • CHANGELOG.md updated

This PR has a few changes to improve cache snapshotting throughput. Under heavy write load, snapshots can start to back up. This can cause 500s to be returned with a cache-max-memory-size error, as well as more frequent write timeouts due to the cache being write-locked for longer periods of time.

The images below are examples of the 1.1 cache and heap memory sizes under very heavy write load (~2M/s). They were run on a c4.8xlarge (36 cores / 64 GB RAM) with both 100k series and 1M series.

The sawtooth pattern below shows the snapshots getting progressively larger and taking longer to complete. At the peaks, write errors for cache-max-memory-size being exceeded could be returned, depending on how the limit is configured. As the snapshots get larger, timeouts also start to occur when compactions kick in and use up CPU.

[screenshot: 1.1 cache size under heavy write load, showing the sawtooth pattern]

Heap also continues to grow in this case:
[screenshot: 1.1 heap size growth under the same load]

With this PR, the same write load now looks like:

[screenshot: cache size under the same write load with this PR]

[screenshot: heap size under the same write load with this PR]

A longer 10B point run across 1M series looks similar, whereas 1.1 would start hitting 500s fairly quickly.

[screenshot: cache size during the 10B point, 1M series run]

In both cases, the cache size stayed well under the default 1GB max memory size.

The actual changes in this PR are the following:

  • Cache.Snapshot was simplified to just swap the hot cache with the snapshot cache instead of iterating over every key and copying values. This reduces lock contention because the cache is write-locked for less time.
  • CacheKeyIterator encodes keys in parallel. Previously, writing a snapshot was single-threaded and would iterate over each key in order and encode all of its values. It now encodes all the keys concurrently while the writer goroutine writes them out to disk, in order, as soon as the encoded blocks are ready. This lets snapshotting leverage all available cores.
  • Fix cache size getting incremented too early. If the cache limit was exceeded, the cache size counter would continue to grow with every failed write. When the snapshot completed, it would sometimes go negative.
  • Fix an issue where compaction scheduling would sometimes stop. I'm not exactly sure how this happened locally, but the cache compactions for a shard just got stuck on a Timer.C channel that never fired. This code was reusing a Timer incorrectly according to the docs, so it has been switched to a Ticker instead.

With these changes, the chances of getting a cache-max-memory-size limit error decrease significantly. Timeouts can still occur depending on system load, so increasing write-timeout may still be needed in some cases, but they should occur much less frequently.

This simplifies the cache.Snapshot func to swap the hot cache into the snapshot cache instead of copying and appending entries. This reduces the amount of time the cache is write-locked, which should reduce cache contention for the read-only code paths.
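As a rough illustration of the swap approach, here is a minimal, self-contained sketch; the type and field names (store, size, snapshot) are placeholders, not the actual tsm1 types:

```go
// Sketch of swapping the hot cache into the snapshot instead of copying
// entries. All names here are illustrative.
package cache

import "sync"

// store maps a series key to its encoded value blocks (placeholder type).
type store map[string][][]byte

type Cache struct {
	mu       sync.RWMutex
	store    store
	size     uint64
	snapshot *Cache
}

// Snapshot hands the hot store to the snapshot cache and installs a fresh,
// empty store for incoming writes. Because there is no per-key iteration or
// value copying, the write lock is held only briefly.
func (c *Cache) Snapshot() *Cache {
	c.mu.Lock()
	defer c.mu.Unlock()

	if c.snapshot == nil {
		c.snapshot = &Cache{store: store{}}
	}

	c.snapshot.store, c.store = c.store, store{}
	c.snapshot.size, c.size = c.size, 0

	return c.snapshot
}
```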
The CacheKeyIterator (used for snapshot compactions) iterated over each key and serially encoded the values for that key as the TSM file was written. With many series, this can be slow and will only use one CPU core even if more are available.

This changes it so that the key space is split amongst a number of goroutines that encode all keys in parallel to improve throughput.
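A self-contained sketch of that fan-out pattern, assuming a per-key "ready" channel so the writer can emit blocks in key order as soon as each one is encoded (the key set and the encodeValues helper are made up for illustration):

```go
// Sketch: workers encode keys concurrently; the writer consumes in key order.
package main

import (
	"fmt"
	"runtime"
	"sort"
	"sync/atomic"
)

// encodeValues stands in for the real per-key block encoding.
func encodeValues(key string) []byte { return []byte("encoded:" + key) }

func main() {
	cache := map[string][]byte{"cpu": nil, "mem": nil, "disk": nil}

	// Sort the key space once so the writer can emit blocks in key order.
	keys := make([]string, 0, len(cache))
	for k := range cache {
		keys = append(keys, k)
	}
	sort.Strings(keys)

	blocks := make([][]byte, len(keys))
	ready := make([]chan struct{}, len(keys))
	for i := range ready {
		ready[i] = make(chan struct{}, 1)
	}

	// Fan out: one goroutine per core claims key indexes via an atomic
	// counter and encodes them independently.
	var next int64 = -1
	for w := 0; w < runtime.GOMAXPROCS(0); w++ {
		go func() {
			for {
				i := int(atomic.AddInt64(&next, 1))
				if i >= len(keys) {
					return
				}
				blocks[i] = encodeValues(keys[i])
				ready[i] <- struct{}{} // signal the writer this block is done
			}
		}()
	}

	// The writer walks keys strictly in order, blocking only until the next
	// needed block has been encoded.
	for i := range keys {
		<-ready[i]
		fmt.Printf("write %s -> %s\n", keys[i], blocks[i])
	}
}
```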
The memory stats, as well as the size of the cache, were not accurate. There was also a problem where the cache size would be increased optimistically but, if the cache size limit was hit, never decreased. This caused the cache size to grow without bound with every failed write.
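A sketch of the accounting fix, assuming an optimistic atomic reservation that is rolled back when the limit check fails; the Cache fields and error value are illustrative, not the real influxdb types:

```go
// Sketch of rolling back the optimistic size reservation on a rejected write.
package cache

import (
	"errors"
	"sync"
	"sync/atomic"
)

var ErrCacheMemorySizeLimitExceeded = errors.New("cache-max-memory-size exceeded")

type Cache struct {
	mu           sync.Mutex
	store        map[string][]byte // assumed initialized elsewhere
	size         uint64            // bytes in the hot cache
	snapshotSize uint64            // bytes held by the current snapshot
	maxSize      uint64            // configured cache-max-memory-size (0 = unlimited)
}

func (c *Cache) Write(key string, values []byte) error {
	added := uint64(len(values))

	// Optimistically reserve the space.
	newSize := atomic.AddUint64(&c.size, added)

	// Enforce the limit. On rejection, give the reservation back so repeated
	// failed writes can no longer inflate the counter without bound.
	if c.maxSize > 0 && newSize+atomic.LoadUint64(&c.snapshotSize) > c.maxSize {
		atomic.AddUint64(&c.size, ^(added - 1)) // atomic subtract of `added`
		return ErrCacheMemorySizeLimitExceeded
	}

	c.mu.Lock()
	c.store[key] = append(c.store[key], values...)
	c.mu.Unlock()
	return nil
}
```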
I ran into an issue where cache snapshotting seemed to stop completely, causing the cache to fill up and never recover. I believe this is due to the Timer being reused incorrectly. Instead, use a Ticker, which fires regularly and does not require the (incorrect) resetting logic.
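A minimal, runnable sketch of the Ticker-driven loop, with an illustrative interval and function name; a Ticker fires repeatedly on its own, so there is no Reset/drain sequence to get wrong:

```go
// Sketch: periodic compaction check driven by a time.Ticker.
package main

import (
	"fmt"
	"time"
)

func compactCacheLoop(done <-chan struct{}) {
	// Unlike a manually reset Timer, a Ticker keeps firing on its own,
	// so the loop cannot silently stop receiving on the channel.
	t := time.NewTicker(time.Second)
	defer t.Stop()

	for {
		select {
		case <-done:
			return
		case <-t.C:
			// Check whether the cache should be snapshotted/compacted.
			fmt.Println("checking cache for snapshot compaction")
		}
	}
}

func main() {
	done := make(chan struct{})
	go compactCacheLoop(done)
	time.Sleep(3 * time.Second)
	close(done)
}
```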
@jwilder added this to the 1.2.0 milestone Jan 12, 2017
@jwilder requested a review from e-dard January 12, 2017 16:02
```go
func (c *cacheKeyIterator) encode() {
	concurrency := runtime.NumCPU()
```
Contributor:

Shouldn't this be runtime.GOMAXPROCS(0) instead of runtime.NumCPU, to respect the GOMAXPROCS environment setting?

As a side note, there's a few other places in the tsm1 package where we're using NumCPU. I think those ought to all be GOMAXPROCS.

Contributor Author:

I used NumCPU because it sounds like GOMAXPROCS might go away at some point: https://golang.org/pkg/runtime/#GOMAXPROCS. I also assumed that NumCPU is the same as GOMAXPROCS(0). That might be incorrect though.

Contributor Author:

Looks like NumCPU is different than GOMAXPROCS(0). Will update to use GOMAXPROCS(0).

Contributor:

If and when it does go away, gofix will be updated to minimize the effort to update our code.

> The GOMAXPROCS variable limits the number of operating system threads that can execute user-level Go code simultaneously.

There may be situations where someone would want to set it lower, possibly in conjunction with taskset to deliberately limit how many CPUs influxdb can use. In other situations, we as developers may want to increase GOMAXPROCS to shake out contention issues.
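For reference, a small runnable comparison: runtime.NumCPU reports the machine's logical CPU count, while runtime.GOMAXPROCS(0) returns the current setting without changing it, so it reflects a GOMAXPROCS environment override:

```go
// Demonstrates the difference between NumCPU and GOMAXPROCS(0).
package main

import (
	"fmt"
	"runtime"
)

func main() {
	fmt.Println("NumCPU:        ", runtime.NumCPU())
	fmt.Println("GOMAXPROCS(0): ", runtime.GOMAXPROCS(0))

	// Sizing worker pools off GOMAXPROCS(0) respects an operator-imposed
	// limit, e.g. running with GOMAXPROCS=4 set in the environment.
	concurrency := runtime.GOMAXPROCS(0)
	fmt.Println("encoder goroutines:", concurrency)
}
```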

```go
			wg.Done()
		}()
	}
	wg.Wait()
```
Contributor:

Because this method is only called in its own goroutine, the WaitGroup seems unnecessary.

Contributor Author:

Good point. Will update.

@e-dard (Contributor) commented Jan 12, 2017

@jwilder I've added 73ed864

@jwilder merged commit 7e7f0d0 into master Jan 12, 2017
@jwilder deleted the jw-cache-snapshot branch January 12, 2017 19:11